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Abstract 

This paper develops a bias correction scheme for a multivariate normal 
model under a general parameterization. In the model, the mean vector and 
the covariance matrix share the same parameters. It includes many impor- 
tant regression models available in the literature as special cases, such as 
(non)linear regression, errors-in-variables models, and so forth. Moreover, 
heteroscedastic situations may also be studied within our framework. We 
derive a general expression for the second-order biases of maximum likeli- 
hood estimates of the model parameters and show that it is always possible 
to obtain the second order bias by means of ordinary weighted lest-squares 
regressions. We enlighten such general expression with an errors-in-variables 
model and also conduct some simulations in order to verify the performance 
of the corrected estimates. The simulation results show that the bias correc- 
tion scheme yields nearly unbiased estimators. We also present an empirical 
ilustration. 

Key Words: Bias correction, errors-in-variables model, maximum likelihood 
estimation, multivariate regression. 

1 Introduction 



Applications of multivariate normal models are commonly found in the literature. 
There are simple models that do not require asymptotic approximations for the 
maximum likelihood estimators. Nevertheless, in the majority of problems, the es- 
timation procedure in such multivariate normal models embrace on the asymptotic 
theory. For instance, it is hard to compute the exact distribution of maximum like- 
lihood estimators (MLEs) for nonlinear multivariate regressions, errors-in-variables 
models and many others when the sample size is finite (practical situations). Then, 
in the practical applications, the asymptotic distribution of the MLE is used as 
an approximation to its exact distribution. It considerable simplifies the inferen- 
tial process. In general, under some regularity conditions, the MLEs are consistent 
and efficient, i.e., asymptotically, their biases converge to zero and their variance- 
covariance matrices approach the inverse of the Fisher information. Moreover, under 
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such regularity conditions, the MLEs are asymptotically normally distributed. Al- 
though the MLEs have these important features, they may be strongly biased for 
small or even moderate samples sizes when more complex models are considered. 
Thus, a bias correction can play an important role to improve the estimation of the 
model parameters. 

An important area of research in statistics is the study of the finite-sample be- 
havior of MLEs. It is well konwn that MLEs are oftentimes biased, thus displaying 
systematic error. This is not a serious problem for relatively large sample sizes, since 
the bias is typically of order C(n _1 ), whereas the asymptotic standard errors are 
of order 0(n~ l l 2 ). However, for small or even moderate values of the sample size 
n, bias can constitute a problem. Thus, availability of formulae for its approximate 
computation is important for accurate estimation of many models that are used in 
a number of applications. Bias correction of MLEs is particularly important when 
the sample size, or the total information, is small (Vasconcellos and Cribari-Neto, 
2005). 

Bias adjustment has been extensively studied in the statistical literature. Box 
(1971) gives a general expression for the rT x bias in multivariate nonlinear models 
where covariance matrices are known. For nonlinear regression models, Cook et al. 
(1986) relate bias to the position of the explanatory variables in the sample space. 
Cordeiro and McCullagh (1991) give general matrix formulae for bias correction in 
generalized linear models. Cordeiro and Vasconcellos (1997) obtained general matrix 
formulae for bias correction in multivariate nonlinear regression models with normal 
errors, while Vasconcellos and Cordeiro (1997) obtained general formulae for bias in 
multivariate nonlinear heteroscedastic regression. Also, Cordeiro and Vasconcellos 
(1999) obtained second order biases of the maximum likelihood estimators in von 
Mises regression models, while Cordeiro et al. (2000) obtained bias correction for 
symmetric nonlinear regression models. Vasconcellos and Cordeiro (2000) obtained 
bias correction for multivariate nonlinear Student t regression models. More recently, 
Vasconcellos and Cribari-Neto (2005) obtained bias correction in a new class of beta 
regressions. Cordeiro and Demetrio (2008) derive formulae for the second-order 
biases of the quasi-maximum likelihood estimators, while Cordeiro and Toyama 
(2008) derive general formulae for the second-order biases in generalized nonlinear 
models with dispersion covariates. 

In this paper we study a multivariate normal model with general parameteriza- 
tion and derive the second-order biases of the maximum likelihood estimates. Here, 
the general parameterization means a sort of unification of several important models 
which can be constructed using the multivariate normal model. For instance, the 
multivariate nonlinear regressions studied by Cordeiro and Vasconcellos (1997) and 
their heteroscedastic version (Vasconcellos and Cordeiro, 1997) are just particular 
cases of our proposal. In this paper we propose a model in which the mean fi and the 
variance S of the observed variables are indexed by the same vector of parameters, 
say 6. The existing works on bias correction assume that the mean and variance do 
not share any parameters, however, in errors-in-variables models, for example, this 
assumption is not realistic. Indeed, that assumption makes the computation of the 
bias formulae less complicated, but it restricts the applicability of the approach to 
a special class of models. In view of that, the main goal of this article is to extend 
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the bias correction to a wide class of multivariate models which has not yet been 
considered in the statistical literature. 

The outline of the paper is as follows. Section 2 presents the main model and 
computes the first three derivatives of the log-likelihood function and their expec- 
tations. In Section 3, we present matrix formulae for the second-order biases of the 
MLEs for the general model. In Section 4, we present some useful examples of the 
proposed formulation. Monte Carlo simulation results are presented and discussed 
in Section 5. The numerical results show that the bias correction we derive is ef- 
fective in small samples; it delivers estimators that are nearly unbiased and display 
superior finite-sample behavior. Section 6 contains an empirical ilustration. Finally, 
Section 7 concludes the paper. 

2 Model specification 

We consider the situation in which n independent multivariate random variables 
Yl, . . . , Y n are observed and the number of responses measured in each observation 
is q. We also admit that auxiliary covariates can be observed, say Xi, . . . , X n . The 
multivariate regression model can then be represented as 

Yi = Hi(0) + Ui, i = l,2,...,n, (1) 

where Y is a q x 1 vector of dependent variables, Hi(0) = A*j(0, Xi) is a mean func- 
tion (the shape is assumed known) which is three times continuously differentiable 
with respect to each element of and Xi is an m x 1 vector of known explanatory 
variables associated with the i th observed response Y- Also, is a p x 1 vector 
of unknown parameters of interest. Additionally, as the foundation for estimation 
by maximum likelihood and hypothesis testing, we assume that the independent 
random errors Uj's follow a multivariate normal Af q (0, distribution, where 

is a q x q nonsingular covariance matrix and the entries of are assumed 

three times continuously differentiable in each element of 0. We are assuming, in 
addition, that the functions fii(0) and are defined in a convenient way since 

must be identifiable in model (1). 

The class of models presented above is very rich and includes many important 
regression models. For example, in an errors-in-variables model we observe two 
variables, namely Y and X± whose relationship is given by 

Yj = a + f3xi + ti and Xi = Xi + u iy (2) 

where Xj ~ J\f((j, x ,al), ~ A/"(0,o" 2 ) and Ui ~ A/"(0,<7 2 ), with <r 2 known and, 
additionally, Xi, and Ui are mutually uncorrelated. Then, denoting Y = (Y, Xi) T 
and = (a, (3, fx x , a 2 x , a 2 ) T we have that Y ~ A2(/*(0), S(0)), where 

This is a simple linear regression in which the covariate is subject to measure- 
ment errors. This is a good example where the usual approach (assuming that X 
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and \x do not share any parameter) is not applicable. Measurement error models 
have been largely used in epidemiology (Kulathinal et al., 2002; de Castro et al., 
2008), astrophysics (Akritas and Bershady, 1996; Kelly, 2007; Kelly et al, 2008) 
and analytical chemistry (Cheng and Riu, 2006) to avoid inconsistent estimators 
(see Fuller, 1987, for further details). Other special cases of model (1) are: multi- 
variate heteroscedastic nonlinear errors-in-variables models, multivariate nonlinear 
heteroscedastic models, univariate nonlinear models, factor analysis, mixed models 
and so on. As can be seen, model (1) can encompass a wide class of models. 

To simplify the notation, define Y = vec(Yi, Y 2 , . . . , Y n ), /x = vec(//i(0), . . . , /x n (0)), 
S = diag{£i(0), . . . , E n (0)} and u = Y — fi, where vec(-) is the vec operator, which 
transforms a matrix into a vector by stacking the columns of the matrix one un- 
derneath the other. Then, the log-likelihood associated with (1), apart from an 
unimportant constant, is 

£(0)oc-ilog|£|-itr{E-W}. (3) 

We make some assumptions (Cox and Hinkley, 1974, Ch. 9) on the behavior of 
£(9) as the sample size n approaches infinity, such as the regularity of the first 
three derivatives of 1(6) with respect to 9 and the uniqueness of the MLE of 9, 
9. We now introduce the following total log-likelihood derivatives, in which the 
indices r, s and t range from 1 to p. Let U r = d£(9)/d9 r , U sr = d 2 £(9) /d9 s d9 r 
and Utsr = d 3 £(9)/d9 t d9 s d9 r be the first three derivatives of £(0). The standard 
notation for the moments of those log-likelihood derivatives is used (Lawley, 1956), 
namely: K sr = E(U sr ), k, %t = ~E(U s U r ), K tsr = E(C/ tsr ) and so on. Furthermore, 
we define the derivative of n sr with respect to 9 t = dK sr /d9 t . Not all «'s 

are functionally independent; e.g., k s ^ t = —K sr , which is the typical element of the 
information matrix Kg, assumed to be nonsingular. All «'s refer to a total over the 
sample and are, in general, of order n. Finally, let n s ' r denote the corresponding 
element of Kg 1 . 

Define the following quantities: 

&tsr "777 o/i 77T - •> 7T7 - 1 *- / sr 



Mr d0 a d0r d9 t d9 s d9r d9 r ' d9,d9 r ' 

~ 89 t d9 s d9 r ' Ar -^r- _5] ° rE and Asr "^7' 
where r, s, t = 1, 2, . . . ,p. We assume that these derivatives do exist. To compute 
the derivatives of £(9) we make use of methods in matrix differential calculus, as 
describe in Magnus and Neudecker (1988). Thus, the first derivative of (3) with 
respect to the r th element of 9 is 

U r = ^tr{A r (S - uu T )} + tr{£-V« T }. 

By using some simple matrix properties, the score function for 9 can be written in 
matrix form as 

Ug = Ug{0) = D Y r 1 ™ - iy T S- 1 vec(S - uu T ), 
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where D = (oi, . . . ,a p ), V = (vec(Ci), . . . , vec(C p )), E = E <g> E and ® is the 
Kronecker product. Let 



77, 



F=(~Y // and 77 f " T , )■ (1) 

Ivi V° 2S / \-vec(Z - uu T )J 

Then, note that the score function can be written as 

U e = F T Hu. 

The second and third derivatives are given, respectively, by 

U sr = itr{(A s SA r + A r EA s - E _1 C sr E _1 )(E - uu T )} 
2 

+ ]-ti{A r (C s + a s u T + uaj)} + tr{(A s a r + E _1 a sr )it T - T,~ 1 a r aJ} 



and 
where 



77 - TfO-) , rr(2) , 7-/(3) , rr(4) 
^tsr — <-7 sr "T u tsr u tsr ' u tsr i 



U£l = -^{(E-^E- 1 + E^C^A + A t C sr S" 1 )(S - uu T ) 
+ E^C^E^C* + S _1 C ir E- 1 (a t « T + M t T )}, 

C/S = ^tr{(A st EA r + A s {C t A r + EA rt ))(E - uu T ) 

+ (A rt EA s + A r (C t A s + EA st ))(E - uu T ) 
+ (A s EA r + A r EA s )(Ct + a t u T + uaj)}, 

Uj® = ^tr{A rt (C s + a s u T + uaj) + A r (C ts + at s 7j, T - a s aj + a t aj + ua^)} 
and 

U^l = tr{(A st a r + A s a rt + A t a sr + E _1 a srt )w T - (A s a r + E~ 1 a sr )a7} 
- tr{E _1 a r a^ + A t a r aJ + E _1 a s a^}. 

Note that E(u) = and E(uu T ) = E. Knowing these properties, the expectation 
of U r , U sr and C/ tsr are easily obtained. The quantities n sr , K tsr and njj (r,s,t = 
1,2, ... ,p) are given, respectively, by 

n sr = ^tr{A r C s } - ajE _1 a r , (5) 

n tsr = tr{(A r EA s + A s TiA r )C t } + \ti{A s C tr + A r C ts + A t C sr } 

I (b) 

— (a t A s a r + a s A t a r + a s A r a t + a t E~ a sr + a ts E~ a r + a s S~ a tr ) 
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and 

k£> = itr{(A r SA s + A s T,A r )C t + A t C rs + A s C rt } 
— (a^X _1 a s + aj A r a s + a^E^a^). 

Again, by using some matrix properties on expression (5), we can written the 
expected Fisher information as 

K g = K e {0) = iTST 1 !) + -V^^V. 

Using F and H given in (4), we can write K e in the form 

K e = F T HF. (8) 

The MLE 6 satisfy the equation Uq = 0. After some matrix manipulations, the 
Fisher scoring method can be used to estimate by iteratively solving the equation 

(fWTgHfH) (™+i) = fMTgMj.(m) i m = 0, 1, 2, . . . , (9) 

where u*^ = f H^H+uH, Each loop, through the iterative scheme (9), consists 
of an iterative re- weighted least squares algorithm to optimize the log-likelihood (3). 
Using equation (9) and any software (MAPLE, MATLAB, Ox, R, SAS) with a weighted lin- 
ear regression routine one can compute the MLE 6 iteratively. It is also noteworthy 
that the MLE in even much complex models, such as multivariate heteroscedastic 
nonlinear errors-in- variables models, may be attained via iterative formula (9). 

It is well known that MLEs are consistent, asymptotically efficient and asymp- 
totically normal distributed. We can write ~ N p (6,K e 1 ), when n is large, ~ 
denoting approximately distributed. Hence, hypotheses testing can be carried out 
using this asymptotic distribution. 

3 Biases of estimates of 

In this section we compute the biases of ML estimates of 6 for models defined by (1). 
Let B(9 a ) be the n~ l bias of 9 a) a = 1,2, ... ,p. It follows from the general expression 
for the multiparameter n~ l biases of MLEs given by Cox and Snell (1968) that 

= ^'^W 2 Ktsr ~ Kts A> 

t,s,r ^ ' 

where ^2 denotes the summation over all combinations of the parameters Q\, . . . , 9 p . 
Following Cordeiro and Klein (1994), we write the above equation in matrix notation 
to obtain rT x bias vector B{0) of 6 in the form 

B{6) = K^WveciK, 1 ), 
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where W = (W^\ . . . , W<p>) is a p x p 2 partitioned matrix, each referring to 
the r th component of being a p x p matrix with typical (t, s)th element given by 
w ts = \ K tsr + i^ts,r = k(s ~ \ K tsr- Notice that from (6) and (7) we have that 

w£ = hr{A t C sr + A s C tr - A r C ts } 



- ^(aJX^asr + oJS-V - aJU^ats) (10) 



+ ]-(a]A t a r + a t T A s a r - ajA r a s ). 

Since .Kg is a symmetric matrix and we are interested in the multiplication result 
of Wvec(KQ 1 ), many terms of (10) cancel. Indeed, note that the t th element of 
WveciKg 1 ) is given by w^k 1 ' 1 + (w$ + w^k 1 ' 2 + ■■■ + (w { t f + w^)K 9 ' r + ■■■ + 
(w^+w^^kP-^+w^k™ and wg+wg = |tr(A t C ir )-a t T S- 1 a ar +ojA t a r . 
Therefore, we can replace the element w£) by jtr(A t C sr ) — |a7S _1 a sr + |ajA t a r 
and may be written in an equivalent way as = F H<fr r , r = 1, . . . ,p, 

where <fr r = — \{G r + J r ) with 



vec(C ir ) • • • vec(C 3 



pr j 



and J r = 





2(I nq ®a r )D 



where I m denotes an m x m identity matrix. That is, the matrix W can be written 
as W = F T H(&i, . . . , $ p ). Then, we arrive at the following theorem. 

Theorem. The n~ l bias vector B (6) of is given by 

B(0) = (F^HF)- 1 ^!!^ (11) 

where 1= . . . , $ p )vec((F T HF)- r ). 

In order to interpret formulae (11) it is helpful to exploit the relationship between 
the n _1 bias of and a linear regression. The bias vector B(0) is simply the set 
coefficients from the ordinary weighted lest-squares regression of £ on the columns 
of F, using weights in H . As expression (11) makes clear, for any particular model 
of the class of models presented in Section 2, it is always possible to express the bias 
of as the solution of an ordinary weighted lest-squares regression. Equation (11) 
is easily handled algebraically for any type of nonlinear model, since it only involves 
simple operations on matrices and vectors. This equation, in conjunction with a 
computer algebra system such as MAPLE (Abell and Braselton, 1994), will yield B{0) 
algebraically with minimal effort. Also, we can compute the bias B(0) numerically 
via software with numerical linear algebra facilities such as Ox (Doornik, 2006) and 
R (R Development Core Team, 2006). [Note that we have described a procedure 
to attain a corrected estimator in a general formulation that covers a wide class of 
models. In the next section we shall present some special cases to shed light on the 
applicability of our general formulation.] 

Therefore, we are able to compute the n~ l biases of the MLEs for the general 
model (1) using formula (11). On the right-hand side of expression (11), which is 
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of order n 1 , consistent estimates of the parameter 6 can be inserted to define the 
corrected MLE 6 = — B(0), where B(-) denotes the value of B(-) at 6. The 
bias-corrected estimate 6 is expected to have better sampling properties than the 
uncorrected estimator, 6. In fact, we present some simulations in Section 5 that 
indicate that 6 has smaller bias than its corresponding MLE, thus suggesting that 
the bias corrections above have the effect of shrinking the modified estimates toward 
to the true parameter values. 

Following Cordeiro (2008), it is worth emphasizing that there are other methods 
to bias-correcting MLEs. In regular parametric problems, Firth (1993) developed 
the so-called "preventive" method, which also allows for the removal of the second- 
order biases. His method consists of modifying the original score function to remove 
the first-order term from the asymptotic bias of these estimates. In exponential 
families with canonical parameterization his correction scheme consists in penalizing 
the likelihood by the Jeffreys invariant prior. This is a preventive approach to bias 
adjustment which has its merits, but the connections between our results and his 
work are not pursued in this paper since they will be developed in future research. 
Additionally, we should also stress that it is possible to avoid cumbersome and 
tedious algebra on cumulant calculations by using Efron's bootstrap (Efron and 
Tibshirani, 1993). We use the analytical approach here since it leads to a closed- form 
solution and we do not need to run extensive numerical resamples. Moreover, the 
application of the analytical bias seems to generally be the most feasible procedure 
to use and it continues to receive attention in the literature. 



4 Special models 



It is useful to consider some examples to illustrate the applicability of the results in 
the previous section and clarify the notation used. Other important special models 
could also be easily handled since formula (11) only requires simple operations on 
matrices and vectors. 

First, consider a univariate nonlinear model (q = 1) in which S = a 2 I n . Note 
that this model is a particular case of model (1) with = ({3 T ,a 2 ) T and fi = 
(fj,i(/3), . . . , /i„(/3)) T , where the components of n and £ are unrelated and vary 
independently. Let p — 1 be the dimension of f3. Here, D = (cti, . . . , a p _i, 0) and 
V = (0,vec(C p )). Also, F = diag^ 1 ), V (2) }, where £>W = (oi, . . . , a p _i) and 
V"( 2 ) = vec(Cp). Then, from (8), the expected Fisher information for 6 can be 
written as K e = F T HF = diagfK^, K a 2}, where Kp = D^ T D^/a 2 is Fisher's 
information for (3 and K c i = n/2a A is the information relative to a 2 . Since Ke is 
block-diagonal, (3 and a are globally orthogonal (Cox and Reid, 1987). From (11), 
note that 

^(i)T^(i))-i j 5(i)T o 

±V^{I n ®I n ) 



[F T HF) 1 F T H 



Also, 
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where G = (a pi, . . . , o^-i)) with ap k = (a lk , . . . , a ip _ 1)k ) and K^ k l is the k 
column of Kp l - Then, 



th 



B(0) 



B((3) 

Bid 2 ) 



( J D(i)T jD (i))-i.D(i)T£ 1 



Note that B0) = (D^ T D^)" 1 1)W T & agrees with the result due to Cook et 
al. (1986, Equation (3)). Additionally, we obtain the following simple form originally 
first given by Beale (1960): B(a 2 ) = — (p — l)a 2 /n; note that 

p— l p— l 

y( 2 ) T (/ n <g> J n ) J](J n <g> a k )bVKpl = vec(C p ) T (I„ ® a k )D^ K~l 

k=l k=l 

= ^tr{a k K,lb^} = tr{£)« V£> (1)T } = (P " ^ ■ 

k=i 

As a second application, consider the multivariate nonlinear heteroscedastic re- 
gression model studied by Vasconcellos and Cordeiro (1997). Note that this model 
is a particular case of model (1), with 6 = (/3 T , cr T ) T , /x = vec(/ii(/3), . . . , /x n (/3)) 
and £ = diag{Si(cr), . . . , S n (cr)}. Therefore, the components of fi and £ are un- 
related and vary independently Let pi and P2 = Pj- Pi be the dimensions of (3 and 
cr, respectively. Here, D = (a,i, . . . , a pi , 0) and V = (0, vec(C Pl+ i), . . . , vec(C p )). 
Let = (ai, a 2 , . . . , a Pl ) and = (vec(C Pl+ i), vec(C Pl+ 2) . . . , vec(C p )), then 
F = diagj-D^, V^}. From (8), the expected Fisher information for 6 can be 
written as K e = F T HF = di&g{Kp, K^}, where Kp = D^E^DW is Fisher's 
information for (3 and = |V^ T S _1 is the information relative to cr. Since 
K is block-diagonal, (3 and cr are globally orthogonal. From (11), it follows that 




(r>(l)T S -l£)(l))-l£|(l)T S -l 

(V"(2)Tg-ly-(2)^-ly(2)Tg-l 



-iGvecjlDWTS- 1 ^ 1 ))- 1 } 



where G = (a^i, . . . , a^J with = (a ifc , . . . , a Plfc ) and W = (v a{pi+1) , v ap ) 
with v ak = (vec(C( Pl+ i)fc), . . . , vec(C pk )) and is the k th column of K^ 1 . There- 
fore, 



B(0) 



B{(3) 
B{a) 



( J D( 1 ) T S- 1 ^( 1 ))- 1 J DW T S- 1 6 

(V-(2)Tg-l V (2)j-ly(2)Tg-l^ 2 



Note that B(/3) = (.D( 1 ) T E- 1 .D( 1 ))- 1 .D( 1 ) T E- 1 £i agrees with the result due to 
Vasconcellos and Cordeiro (1997, Equation (3.2)). Additionally, note that B{S) = 
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(y(2)T S -i v (2))-i v (2)T S -i£ 2 algo reduces to Vasconcellos and Cordeiro's (1997) 
Eq. (3.8), since 



pi 



y(2)T E -i J2(I nq ® a k )D^K~ p l = V (2)T £ _1 vec(A*), 



k=i 



where A* is as defined by Vasconcellos and Cordeiro (1997, p. 148). 

Next, unlike the two models discussed previously, we consider a model where 
the elements of /x and £ are related and do not vary independently. Consider the 
nonlinear heteroscedastic errors-in-variables model 

Yi = a + (3xi + exp(7^) + d and Xi — Xi + u h 

where Xi ~ Af(fi x , &1) an d u i ~ -^(0, 0" 2 ) are the measurement errors with a 2 known 
and Ci ~ J\f(0, a 2 exp{i]Zi}) . The covariate Zi is known. In this example, the vector 
of parameters is 6 = (a, ft, 7, /i x , a 2 , a 2 , r/) T and the mean and variance functions for 
the i th observation (Y^Xj) are given by 



a + /3/i x + exp(7^)\ _ / (3 2 a 2 x + a 2 exp(rjZi) fta 



" i = l fe ) and E, -v W + 



v 2 



Then, 

a, = 1„®(J) , a 2 = , a 3 = vec { ( 2 ' . . . ( Z » e ^»> 

a 4 = l n <S> ^\ and a 5 = a 6 = a 7 = 0, 
where l n denotes an n x 1 vector of ones. Also, a rs = for all r, s except for 

«,^=W(J) and a 3 3 = vec{( 2 ' eX P(^))...( 2 » eX Ph 2 - 

Also, C r = for all r except for 



2(3a 2 x a 2 \ _ f (3 2 0\ n /expfa*) 



and 

„ « / ^o" 2 expfwz;) 0\ 

C '=,S( oj- 

where © is the direct sum of matrices. Additionally, C rs = for all r, s except for 

c 22 = i n ®(^f c 25 = c 52 = i n ® ( 2 f J), 

„ „ " f ZiexpiriZi) 0\ , „ n f z 2 a 2 expiriZi) 0\ 

C 67 = C 76 = ® and C 77 = e ™ . 
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Thus, D= (01,02,03,04,0,0,0), V = (0,vec(C 2 ),0,0,vec(C 5 ),vec(C 6 ),vec(C 7 )) 
and the matrix formula (11) can be used to compute the second-order bias for this 
model. Notice that, as vec(C 2 ) is not equal to zero, the derivation of algebraic 
expression using matrix formula (11) becomes very tedious, since the structure of 
Kq is not block- diagonal unlike the two previous examples. However, using MAPLE, 
for example, the derivation can be easily done. Also, the n~ x bias vector B(0) can 
be obtained numerically via software with numerical linear algebra facilities with 
minimal effort such as R and Ox. 



5 Simulation study 

We recall that, for large samples the biases of the MLEs are neglible. However, for 
small and moderate sample sizes the second-order biases may be large and can be 
used to improve the estimation. We shall use Monte Carlo simulation to evaluate 
the finite sample performance of the original MLEs and their corrected versions. 
All simulations were performed using R (R Development Core Team, 2006). The 
sample sizes considered were n = 15, 25, 35, 50 and 100, the number of Monte Carlo 
replications was 5,000. 

We consider the simple errors-in- variables model as described in Fuller (1987): 

Yi = a + f3xi + ei and = %i + u i: 

where Xi ~ Af(fi x , &1) an d u % ~ A/"(0, a 2 ) are the measurement errors with a 2 known 
and ei ~ A/"(0, a 2 ), with % = 1, 2, . . . , n. Here, = (a, (3, fi x , a 2 ,, a 2 ) T and 



M = l„»r+/M and S = 7„«( 8V jt ff2 J*. 



From the previous expressions, we have immediately that 

ai = l n <g>fjj, a 2 = l n ®y^\, a 3 = l n ®(^). ai = a r , = 



and a rs = for all r, s except for 

O23 — 032 = l n 
Also, C r = for all r except for 



C 2 = /„ ® a pj , C 4 = I n ® ^ ^ and C 5 = I n ® (J q 

Additionally, C rs = for all r, s except for 

C 22 = In ® ( 2 q' and C 24 = C 42 = I n (8) ( 2 f J 

Thus, £> = (ai, a 2 , a 3 , 0, 0) and V = (0, vec(C 2 ), 0, vec(C 4 ), vec(C 5 )). Therefore, 
all the quantities necessary to calculate B{0) using expression (11) are given. 
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In order to analyze the point estimation results, we computed, for each sample 
size and for each estimator: the relative bias (the relative bias of an estimator 9 is 
defined as {E(0) — 6}/6, its estimate being obtained by estimating E(0) by Monte 
Carlo) and the root mean square error, i.e., v / MSE, where MSE is the mean squared 
error estimated from the 5,000 Monte Carlo replications. Without loss of generality, 
the true values of the regression parameters were set at a — 67, (3 = 0.42, fi x = 70, 
a 2 = 247 and a 2 = 43. The parameter setting were choosen in order to represent the 
dataset (yields of corn on Marshall soil in Iowa) presented in Fuller (1987, p. 18). 
The known measurement error variance is a 2 = 57 (which was attained through a 
previous experiment). 

Table 1 gives the relative biases and VMSE of both uncorrected and corrected 
estimates. The figures in this table confirm that the bias-corrected estimates are 
generally closer to the true parameter values than the unadjusted estimates. We 
observe that, in absolute value, the estimated relative biases of the bias-corrected 
estimator were smaller than that of the original MLE for all sample sizes considered, 
thus showing the effectiveness of the bias correction schemes used in the definition 
of these estimators. 

For instance, when n = 15, the estimated relative bias of the estimators of a, 
(3, /i x , a 2 and a 2 average —0.0518 whereas the biases of the five corresponding bias- 
adjusted estimators average —0.0056; that is, the average bias (in value absolute) 
of the MLEs is almost ten times larger than that of the bias-corrected estimators. 
This suggests that the second-order bias of MLEs should not be ignored in samples 
of small to moderate sizes since they can be nonnegligible. 

We can readily see that the MLEs of a 2 and a 2 are on average far from the true 
parameter value, thus displaying large bias, for the different sample sizes considered, 
even when n = 100. This stresses the importance of using a bias correction. For 
instance, when n = 50, the relative biases of a 2 and a 2 (MLEs) were —0.0226 and 
—0.0563, respectively, while the relative biases of a 2 and er 2 (BCEs) were 0.0016 
(sixteen times lesser) and —0.0011 (fifty times lesser), respectively. Observe that 
the MLEs are always underestimating the true values of a 2 and a 2 , since their 
biases are always negatives. Note also that root mean square error decrease with 
n, as expected. Additionally, we note that all estimators have similar root mean 
squared errors. 

6 An empirical ilustration 

Next, as an empirical ilustration, consider a small data set given by Fuller (1987, 
p. 18). The data set is presented in Table 2. The data are yields of corn and 
determinations of available soil nitrogen collected at 11 sites on Marshall soil in 
Iowa. Following Fuller (1987, p. 18), we assume that the estimates of soil nitrogen 
contain measurement erros arise from two sources. First, only a small sample of soil 
is selected from each plot and, as a result, there is the sampling error associated 
with the use of sample to represent the whole. Second, there is a measurement error 
associated with the chemical analysis used to determined the level of nitrogen in the 
soil sample. The variance arising from these two sources is a 2 = 57. According to 
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Table 1: Relative biases and VMSE of uncorrected and corrected estimates for an 

errors-in-varia bles model. 
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—0.0001 


2.96 




°l 


-0.0424 


71.36 


-0.0084 


72.64 




a 2 


-0.0799 


12.83 


-0.0014 


13.04 


50 


a 


-0.0080 


5.69 


0.0002 


5.50 




P 


0.0190 


0.08 


0.0005 


0.08 




Vx 


-0.0007 


2.45 


-0.0007 


2.45 




°l 


-0.0226 


60.76 


0.0016 


61.71 




a 2 


-0.0563 


10.75 


-0.0011 


10.89 


100 


a 


-0.0025 


3.83 


0.0013 


3.78 




P 


0.0057 


0.05 


-0.0029 


0.05 




Hx 


0.0002 


1.72 


0.0002 


1.72 






-0.0131 


42.24 


-0.0009 


42.54 




a 2 


-0.0298 


7.63 


-0.0021 


7.67 



BCE: bias-corrected estimator. 
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Fuller (1987, p. 18), model (2) is a valid representation to these data. 



Table 2: Yields of corn on Marshall soil in Iowa. 







Soil 






Soil 




Yield 


Nitrogen 




Yield 


Nitrogen 


Site 


(Y) 


(X) 


Site 


(Y) 


(X) 


1 


86 


70 


7 


99 


50 


2 


115 


97 


8 


96 


70 


3 


90 


53 


9 


99 


94 


4 


86 


64 


10 


104 


69 


5 


110 


95 


11 


96 


51 


6 


91 


64 









The MLEs, the large-sample estimates of the corresponding standard errors, the 
biases and the bias-corrected estimates are given Table 3. These estimates were 
obtained using R. The figures in this table show that the biases of the estimates 
of a and (5 are much less than standard errors of the corresponding estimates. In 
cases of marginal statistical significance, biases of this maginitude could have a small 
effect on the conclusions. However, note that the MLEs of a 2 and a 2 are strongly 
biased, as evidenced by our simulations studies, i.e., they underestimate the model 
variances. Therefore, the bias-corrected estimates may be used instead of the MLEs 
to make point inferences. 



Table 3: MLEs and bias-corrected estimatives. 



Parameter 


MLEs 


S.E. 


Bias 


BCEs 


a 


66.8606 


11.7272 


-2.5334 


69.3939 


P 


0.4331 


0.1633 


0.0359 


0.3973 




70.6364 


5.0194 


0.0000 


70.6364 




220.1405 


118.1731 


-25.1946 


245.3351 


a 2 


38.4058 


20.9357 


-10.3344 


48.7402 



BCE: bias-corrected estimate. 



Figure 1 presents the scatterplot of the data together with the fitted lines ob- 
tained using the MLEs and BCEs. Notice that the line produced by the bias cor- 
rection scheme the inclination slightly attenuated and intercept increased relative 
to the non-corrected one. 

7 Conclusions 

This paper proposed a bias correction for a multivariate normal model with a quite 
general parameterization. The main result centers on models where the mean and 
the variance share the same vector of parameters. Many models are particular cases 
of the proposed model such as (non) linear regressions, errors- in- variables models, 
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Figure 1: Scatterplot for the data set together with the fitted lines. 

mixed models, factor analysis and so forth. We have shown that it is always possible 
to express the second order bias vector of the maximum likelihood estimates as an 
ordinary weighted least-squares regression. Moreover, we derived a bias-adjustment 
scheme that nearly eliminates the bias of the maximum likelihood estimator in small 
and moderate samples. Our simulation results suggest that the bias correction we 
have derived is very effective, even when the sample size is small. Indeed, the bias 
correction mechanism proposed in this paper yields modified maximum likelihood 
estimators that are nearly unbiased. We also presented an empirical ilustration that 
illustrates the proposed bias-adjustment scheme. 
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